Data Visualization

Cory Whitney
"2019-03-13"

Data visualization: getting stuck

  • Open RStudio

  • Help > Cheatsheets > Data Visualization with ggplot2

  • Type ‘?’ followed by a function, package or data name in the R console

  • Search the web for a copy of the error message with “R” added

  • Many talented programmers scan the web and answer questions

https://stackoverflow.com/

Creating basic plots

R has several systems for making graphs

  • Base R
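# stringsAsFactors = TRUE reads text columns as factors (no longer the default since R 4.0)
participants_data <- read.csv("participants_data.csv", stringsAsFactors = TRUE)
# plot() on a factor draws a bar plot of the number of observations per level
plot(participants_data$academic_parents)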

Bar plot of number of observations of binary data related to academic parents

plot(participants_data$academic_parents, participants_data$days_to_email_response)

Boxplot of days to email response grouped by binary data related to academic parents
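The two calls above give different charts because plot() picks the chart type from the class of its arguments. A minimal sketch of that behaviour, assuming the built-in mtcars data as a stand-in so that it runs without participants_data.csv (the variable name cylinders is only for illustration):

```r
# Built-in data as a stand-in for participants_data
cylinders <- factor(mtcars$cyl)

# A single factor: bar plot of the number of observations per level
plot(cylinders)

# A factor and a numeric vector: one boxplot per factor level
plot(cylinders, mtcars$mpg)

# Two numeric vectors: scatterplot
plot(mtcars$wt, mtcars$mpg)
```

If a column of your own data does not plot as expected, class(participants_data$academic_parents) shows how R has stored it.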

Use help '?' for function

?plot

ggplot2: overview

R has several systems for making graphs

  • ggplot2 is one of the most elegant and most versatile.

  • It implements the grammar of graphics to describe and build graphs.

  • Do more faster by learning one system and applying it in many places.

  • Learn more about ggplot2 in “The Layered Grammar of Graphics”

http://vita.had.co.nz/papers/layered-grammar.pdf

ggplot2: qplot with participant data


library(ggplot2)
qplot(days_to_email_response, letters_in_first_name, data = participants_data)

Scatterplot of letters in your first name as a function of days to email response

Use help '?' for function

?qplot

Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/

ggplot2: qplot with built-in data


Example from Fisher's iris data set

qplot(Sepal.Length, Petal.Length, data=iris, color=Species, size=Petal.Width)

Scatterplot of iris petal length as a function of sepal length with colors representing iris species and petal width as bubble sizes.

Use help '?' for data

?iris
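Note that qplot() has been deprecated since ggplot2 3.4.0, so newer versions print a warning when it is called. The same iris figure can be sketched with ggplot(), which spells out the grammar of graphics: data, aesthetic mappings, then a geom layer. The object name iris_plot is only for illustration.

```r
library(ggplot2)

# Data and aesthetic mappings go in ggplot(); the geom layer draws the points
iris_plot <- ggplot(
  iris,
  aes(x = Sepal.Length, y = Petal.Length, colour = Species, size = Petal.Width)
) +
  geom_point()

# Printing a ggplot object draws it
print(iris_plot)
```

Further layers (for example + theme_minimal()) can be added to iris_plot in the same way.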

ggplot2: qplot with your data


Example from your data

qplot(days_to_email_response, letters_in_first_name, color=academic_parents, size=working_hours_per_day, data=participants_data)

Scatterplot of letters in your first name as a function of days to email response with colors representing binary data related to academic parents and working hours per day as bubble sizes.

Make more graphs

Correlation



    Pearson's product-moment correlation

data:  participants_data$days_to_email_response and participants_data$letters_in_first_name
t = -0.64191, df = 7, p-value = 0.5414
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.7780668  0.5078670
sample estimates:
       cor 
-0.2357798 

Use help '?' for function

?cor.test
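The printed result above has the form that cor.test() returns. As a sketch of how such a test is run and read, assuming the built-in iris data instead of participants_data so that it runs anywhere (the column choice and the name iris_cor are only for illustration):

```r
# Pearson correlation test between two numeric columns
iris_cor <- cor.test(iris$Sepal.Length, iris$Petal.Length)

# Printing the object gives the same kind of summary as shown above
print(iris_cor)

# Individual parts of the result can be extracted by name
iris_cor$estimate   # correlation coefficient (cor)
iris_cor$parameter  # degrees of freedom (number of observations minus 2)
iris_cor$p.value    # p-value of the test
iris_cor$conf.int   # 95 percent confidence interval
```

For the participant data, the two columns named in the "data:" line of the output would take the place of the iris columns.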

Bonus: gganimate Datasaurus Dozen


  • Using the datasauRus, ggplot2 and gganimate libraries.
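library(datasauRus)  # provides the datasaurus_dozen data
library(ggplot2)
library(gganimate)
# One state per dataset, with eased transitions between the states
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  transition_states(dataset, transition_length = 3, state_length = 1) +
  ease_aes('cubic-in-out')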


Bonus: gganimate mtcars mpg

  • Using the gifski, ggplot2 and gganimate libraries.
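library(ggplot2)
library(gganimate)
library(gifski)  # renders the animation frames to a GIF
# Animate between am = 0 (automatic) and am = 1 (manual); the axes follow the data
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  geom_point() +
  transition_states(am, transition_length = 4, state_length = 1) +
  view_follow()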


ggplot2: geom_tile

  • Using the gifski, ggplot2 and gganimate libraries.

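The code for this slide's figure is not shown, so the following is only a sketch of what geom_tile() does, not the original plot. It assumes the built-in mtcars data and draws a correlation heatmap, with one tile per pair of variables; the names cor_matrix, cor_long and heatmap_plot are only for illustration.

```r
library(ggplot2)

# Correlation matrix of four numeric columns
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# geom_tile() needs long format: one row per tile
cor_long <- as.data.frame(as.table(cor_matrix))
names(cor_long) <- c("var_x", "var_y", "correlation")

# One tile per variable pair, filled by the correlation value
heatmap_plot <- ggplot(cor_long, aes(x = var_x, y = var_y, fill = correlation)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1))

print(heatmap_plot)
```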

  • Check the journal's requirements for figure size, resolution etc. before exporting
?pdf
?png
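A minimal sketch of exporting a figure with pdf() and png(). The sizes and the resolution are placeholder values, not any journal's requirements, and the files go to a temporary directory; both are assumptions to be replaced with your own settings.

```r
library(ggplot2)

figure <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point()

# Vector output: pdf() takes width and height in inches
pdf_path <- file.path(tempdir(), "figure.pdf")
pdf(pdf_path, width = 4, height = 3)
print(figure)
dev.off()  # closing the device writes the file

# Raster output: png() takes units and a resolution in pixels per inch
png_path <- file.path(tempdir(), "figure.png")
png(png_path, width = 10, height = 7.5, units = "cm", res = 300)
print(figure)
dev.off()
```

The same pattern (open a device, draw, dev.off()) works for base R plots. For ggplot2 figures, ggsave() is an alternative that wraps these steps.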

Tasks for the afternoon: Basic


  • Check your data for interesting trends and correlations
  • Use scatter plots, barcharts and boxplots
  • Bootstrap and vary the sample and run the same analysis and plots
  • Save your most interesting figure and share it with us the next day
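For the bootstrap task, one possible sketch: resample the rows with replacement many times and repeat the same analysis on each resample. It assumes the built-in iris data as a stand-in for your own data; the number of resamples (1000) and the seed are arbitrary choices.

```r
set.seed(42)  # makes the resampling reproducible

n_rows <- nrow(iris)

# Repeat the same correlation on 1000 resamples drawn with replacement
boot_cor <- replicate(1000, {
  idx <- sample(n_rows, replace = TRUE)
  cor(iris$Sepal.Length[idx], iris$Petal.Length[idx])
})

# Spread of the estimate across resamples
quantile(boot_cor, c(0.025, 0.975))
hist(boot_cor, main = "Bootstrapped correlation", xlab = "cor")
```

With a small sample, such as the nine observations behind the correlation output shown earlier, the resampled estimates can be expected to vary much more widely than they do here.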

Tasks for the afternoon: Advanced


  • Import data from an external source (e.g. FAO, World Bank)
  • Display those data in an interactive plot
  • Play around with the design
  • Export your most interesting figure and share it with us tomorrow
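One way to approach the interactive plot is the plotly package, whose ggplotly() function converts a ggplot2 figure into an interactive one. This is a suggestion rather than something the slides prescribe; the sketch assumes plotly may not be installed and uses the built-in iris data in place of imported data.

```r
library(ggplot2)

# Build the static figure first
static_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point()

# Convert it only if plotly is available (install.packages("plotly") otherwise)
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(static_plot)
  # Typing interactive_plot in the RStudio console shows it in the Viewer pane
}
```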

Be prepared for tomorrow

Install Git and join GitHub (if you have not already done so).

Git https://git-scm.com/downloads

Git and GitHub guide http://r-pkgs.had.co.nz/git.html

Join GitHub https://github.com/